Tolerating Branch Predictor Latency on SMT
Authors
Abstract
Simultaneous Multithreading (SMT) tolerates latency by executing instructions from multiple threads: if one thread is stalled, other threads can use the resources. However, fetch stalls caused by multi-cycle branch predictors prevent SMT from reaching its full performance potential, since the flow of fetched instructions is halted. This paper proposes and evaluates solutions to deal with the branch predictor delay on SMT. Our contribution is two-fold: we describe a decoupled implementation of the SMT fetch unit, and we propose an inter-thread pipelined branch predictor implementation. These techniques prove effective at tolerating the branch predictor access latency.
Keywords: SMT, branch predictor delay, decoupled fetch, predictor pipelining.
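To illustrate why pipelining a multi-cycle predictor across threads can help, the toy cycle model below counts how many predictor requests can start within a window of cycles. It is a minimal sketch under stated assumptions, not the paper's design: the function name `predictions_started` is hypothetical, each thread is assumed to need the result of its previous prediction before issuing its next request, the predictor is assumed to accept at most one request per cycle, and ready threads are assumed to be picked round-robin.

```python
def predictions_started(latency: int, num_threads: int, cycles: int, pipelined: bool) -> int:
    """Count predictor requests started in `cycles` cycles (hypothetical toy model).

    Assumptions: each request takes `latency` cycles; a thread cannot issue its
    next request until its previous prediction returns; at most one request
    starts per cycle; ready threads are picked round-robin.
    """
    ready_at = [0] * num_threads  # first cycle each thread may issue again
    free_at = 0                   # first cycle a non-pipelined predictor is idle
    rr = 0                        # round-robin pointer
    started = 0
    for cycle in range(cycles):
        # A non-pipelined predictor holds only one request at a time.
        if not pipelined and cycle < free_at:
            continue
        for i in range(num_threads):
            t = (rr + i) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency  # thread waits for its own result
                free_at = cycle + latency      # only checked when not pipelined
                rr = (t + 1) % num_threads
                started += 1
                break  # at most one new request per cycle
    return started


for threads in (1, 2, 4):
    for pipelined in (False, True):
        n = predictions_started(latency=4, num_threads=threads, cycles=100, pipelined=pipelined)
        print(f"threads={threads} pipelined={pipelined}: {n} predictions in 100 cycles")
```

In this model, with a 4-cycle predictor a single thread starts only 25 predictions in 100 cycles whether or not the predictor is pipelined, because each request waits on the previous one. With four threads interleaved in the pipelined predictor, a new request can start every cycle (100 in 100 cycles), while the non-pipelined predictor stays at 25 regardless of thread count.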
Similar articles
A latency-conscious SMT branch prediction architecture
Executing multiple threads has proved to be an effective way to partially hide the latencies that appear in a processor. When a thread is stalled because a long-latency operation, such as a memory access or a floating-point calculation, is being processed, the processor can switch to another context so that another thread can take advantage of the idle resources. However, fetch stall conditions cau...
An Effective Bypass Mechanism to Enhance Branch Predictor for SMT Processors
Unlike traditional superscalar processors, a Simultaneous Multithreaded processor can exploit both instruction-level parallelism and thread-level parallelism at the same time. With the same fetch width, an SMT processor fetches instructions from each thread less deeply than a traditional superscalar processor does. Meanwhile, all the instructions from different threads share the same functional units in SMT. All...
Evaluating Branch Predictors on an SMT Processor
Simultaneous multithreading (SMT) provides significant increases in microprocessor throughput by issuing instructions from multiple threads per clock cycle. SMT can be realized in a wide-issue superscalar with a modest increase in resources, because much of the hardware is shared among the multiple thread contexts. Branch prediction accuracy, a key component of microprocessor performance, can s...
Neural Branch Prediction
The new neural predictor improves accuracy by combining path and pattern history to overcome limitations inherent in previous predictors. It uses a different prediction algorithm that would allow parallel execution of instructions during every prediction, thereby keeping the latency low. In fact, the fast path-based neural predictor has a latency comparable to the predictors from industrial desi...
Reconsidering Complex Branch Predictors
To sustain instruction throughput rates in more aggressively clocked microarchitectures, microarchitects have incorporated larger and more complex branch predictors into their designs, taking advantage of the increasing numbers of transistors available on a chip. Unfortunately, because of penalties associated with their implementations, the extra accuracy provided by many branch predictors does...